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Abstract 

Two cluster algorithms, based on constructing and flipping loops, are pre- 
sented for worldline quantum Monte Carlo simulations of fermions and are 
tested on the one-dimensional repulsive Hubbard model. We call these algo- 
rithms the loop-flip and loop-exchange algorithms. For these two algorithms 
and the standard worldline algorithm, we calculated the autocorrelation times 
for various physical quantities and found that the ordinary worldline algo- 
rithm, which uses only local moves, suffers from very long correlation times 
that makes not only the estimate of the error difficult but also the estimate of 
the average values themselves difficult. These difficulties are especially severe 
in the low-temperature, large-?/ regime. In contrast, we find that new algo- 
rithms, when used alone or in combinations with themselves and the standard 
algorithm, can have significantly smaller autocorrelation times, in some cases 
being smaller by three orders of magnitude. The new algorithms, which use 
non-local moves, are discussed from the point of view of a general prescription 
for developing cluster algorithms. The loop-flip algorithm is also shown to be 
ergodic and to belong to the grand canonical ensemble. Extensions to other 
models and higher dimensions is briefly discussed. 

(To appear in Phys. Rev. B) 
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I. INTRODUCTION 



The worldline quantum Monte Carlo method is frequently used by condensed matter 
and field theorists to simulate lattice models of systems of interacting fermions, bosons, and 
quantum spins 0. This method has become a textbook example of a quantum Monte 
Carlo method as one of its virtues is its simplicity. Another virtue is its production of 
worldline patterns that often pictorially represent the imaginary-time quantum dynamics of 
the model. Several difficulties with the method are also well known. 

The most notable difficulty j3| , which the worldline method shares with almost all other 
quantum Monte Carlo methods, is a sign problem which is manifested by Monte Carlo 
transition probabilities becoming negative. Typically, this difficulty renders the method 
ineffective. A difficulty || more unique to the worldline method is the lack of ergodicity as 
in practice the winding number of the worldlines is conserved and thereby the sampling of 
phase space is restricted. The winding number conservation corresponds to the conservation 
of fermion number and thus places the method in the canonical ensemble. 

A less appreciated and infrequently studied difficulty is the worldline method's very long 
auto-correlation time between measurements of physical quantities. These long times make 
error estimation for these quantities difficult and can cause long computer runs. The main 
purpose of this paper is to illustrate the extraordinary lengths these times can take and 
to present two new ways of implementing the worldline method that in many cases reduce 
these times by several orders of magnitude. The new methods are a significant improvement 
in efficiency. One method is also naturally ergodic and grand canonical as winding number 
is not conserved. 

With improved efficiency, the new methods generate significant reductions in the variance 
of the calculated results for a fixed amount of computing time. Since the sign problem is 
usually accompanied by a dramatic increase in the variance of the measured quantities, the 
extensions potentially mitigate the sign problem. They do not, however, address it directly. 
Oddly, some combinations of the new methods permit a sign problem to occur in simulations 
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for which the standard implementations have no sign problem. We found that this was a 
minor problem, occurring infrequently only at high temperature in very small systems and 
decreasing as the lattice size was increased. Constructing still other methods that conserve 
winding number is possible; however, allowing the winding number to change often appears 
to be a key ingredient for improved performance. 

We will present our new algorithms by discussing their application to worldline sim- 
ulations of the one-dimensional repulsive Hubbard model. The Hubbard model is one of 
the simplest models of interacting electrons, and for it we will study the auto-correlation 
time among measurements of the different physical quantities germane to its interesting 
physics. In Section II, we define the model and give a brief description of the computation 
of its thermodynamic properties from a path-integral. Here, we will also define and discuss 
how we measured the auto-correlation times. To establish notation and make subsequent 
discussion reasonably self-contained, we also present the particular implementation of the 
worldline method used in our studies. This implementation focuses on Monte Carlo moves 
that change the state of plaquettes defined on the space-time lattice on which the worldlines 
exist. 

In Section III, we present our two new worldline methods, which we call the loop-flip 
and loop-exchange methods. The loop-flip method is an extension of the work of Evertz 
et al. |4| who developed a loop-flip algorithm to reduce critical slowing down in 6-vertex 
model simulations. In that work, they observed that worldline simulations of quantum spin 
systems have similarities to simulations of the 6-vertex model and suggested the potential 
utility of their algorithm for simulating such systems. We will discuss the extensions of the 
procedure necessary for simulating fermion systems. We also present our second method, the 
loop-exchange algorithm. For fermions systems, we found it very useful to exchange portions 
of up and down spin worldlines to accelerate the sampling of phase space. This exchange 
was designed to overcome the difficulty of moving up and down spin worldlines across one 
another because of the Coulomb repulsion between the fermions that exists in the Hubbard 
model. Our two new algorithms, in sharp contrast with the standard implementations of 
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the worldline method, use non-local (global) updating moves. These moves generalize the 
cluster algorithms recently developed to reduce long auto-correlation times accompanying 
simulations of critical phenomena in classical spin systems ||. Here, we are not concerned 
with critical phenomena, but rather we are reducing the inherently long auto-correlation 
times that occur in the worldline method even when the physical system is far removed 
from any known phase transition. 

We investigate in Section IV the effectiveness of various combinations of the algorithms 
for computing several different properties of the repulaive Hubbard model. For a stan- 
dard version of the worldline method, which we call the plaquette-flip algorithm, the auto- 
correlation time was too long to measure in most cases, even with the use of relatively long 
computer runs. In these cases, reliable error estimation is very difficult. Our new algo- 
rithms did not suffer from this problem. Finally, in Section V, we summarize our findings 
and discuss important issues awaiting further investigation, particularly for doing models 
other than the Hubbard model and for studying models in higher dimensions. 

II. BACKGROUND AND DEFINITIONS 

A. Hubbard Model 

The one-dimensional Hubbard Hamiltonian is 

H = X)tf M+1 = E[ T »+i + \{Vi + V i+1 )} (2.1) 



where 



T M+1 = -tJ2(4,*£i+l,<7 + 4+l,aCi,a) (2.2) 



and 



Vi = J2[\u(n i!(T - i)K_ CT - ±) - fi( ni , a - 1)] (2.3) 

Here, c| CT and q jCr are the creation and destruction operators for a fermion at site i with spin 
cr (up or down) and = c\ a Ci j(T is the fermion number operator at site i for spin a. The 



first term in ( |2.1| ) is the kinetic energy of the electrons and describes their hopping, without 
spin flip, from site to site. The second term is the potential energy (Coulomb interaction) 
that exists only if two electrons occupy the same site and includes the chemical potential 
H term for convenience. We took U > 0. The model is defined in such a way that when \x 
equals zero, the model exhibits particle-hole symmetry. With this symmetry, there are an 
equal number of up and down spin electrons and their sum equals the number of lattice sites 
iV (the half-filled case). We assume periodic boundary conditions. 

At half-filling, the system is an insulator, with anti-ferromagnetic spin fluctuations dom- 
inating charge- density wave fluctuations. At other fillings, it is metallic. Increasing the 
parameter U supresses the charge fluctuations and enhances the spin fluctuations. The 
following correlation functions are useful descriptors of these two types of fluctuations |J, 

T X ±(q) = "^7 [ P drJ2e^ k ((n j+kA (T) ± n i+M (r))(n iiT (0) ± n i4 (0)> (2.4) 
piv jo j>k 

They measure static spin-density wave (SDW) and charge-density wave (CDW) correla- 
tions. Simpler measurable quantities include the energy E = (H) and the average electron 
occupancy per site, 

n = ^ [ dr^n^r) + n 3i {r))) (2.5) 

As we discuss below, we compute these quantities as functions of imaginary-time r and their 
definitions reflect an average over this parameter. 

In ( |2.4j) and (|2.5|) , the symbol (• • •) denotes the finite-temperature expectation value of 
some physical observable and is defined by 

(A) = TrAe~P H /Z (2.6) 

where (3 = 1/kT is the inverse temperature, Z = Tr e~^ H is the partition function of the 
system, and Tr denotes the trace operation. The basic idea of the worldline method is to 
express the trace operation as a path-integral in imaginary-time and then use Monte Carlo 
techniques to evaluate the resulting multi-dimensional functional integral numerically. 
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B. Path-Integrals 



To develop the path-integral, we rewrite the partition function of the system by dividing 
the imaginary-time interval 0<t</3 into L slices of width r = /3/L and inserting complete 
sets of states \s) at each time-slice: 

Z = E(*il e ~ r "l^> ■ ■ • (s3\e- TH \s 2 )(s 2 \e- TH \ Sl ) (2.7) 

where the sum is over the set {sj} = {si, s 2 , . . . , sl} of all states Sj. The subscript on the 
identifier s of the different members of the complete set marks the time-slice at which the 
states were inserted. The properties of the trace require the propagation in imaginary-time 
to be periodic. 

In order to simplify the evaluation of the matrix elements in (|2.7|) , we rewrite the full 
Hamiltonian as the sum of two easily diagonalizable, but not necessarily commuting, pieces, 
H = Hi + H 2 . Using one form of the Suzuki- Trotter approximation |7[], we may then 
approximate the imaginary-time evolution operator e~ rH by 

e~ rH « e- TH2 e- rHl (2.8) 



When this approximation is used in the expression for the partition function ( |2.7| ) and 
additional intermediate states are inserted, the resulting expression for the partition function 
is 

Z = J2( s i\ e ~ TH2 \ s 2L)---(s4\e- THl \s 3 ) 

{si} 

x(sz\e- TH2 \s 2 )(s 2 \e- TH i\ Sl ) (2.9) 

In Section [II Cj , we will give specific choices of Hi and \s) for the Hubbard model. Here, we 
are concerned only with the general framework of the method. 

We now see that the partition function ( |2.9| ) may be re-expressed as a functional integral 
over all possible configurations C = {si}. To evaluate the path integral, Monte Carlo 
importance sampling is used. This method generates a sequence of configurations C{ (see 
below), which, in the infinite-sequence limit, conforms to the probability distribution 



P(C) = ( Sl \e- TH2 \s 2L ) ■ ■ ■ (s 3 \e' TH2 \s 2 )(s 2 \e-^\ Sl )/Z 

= W(C)/Z (2.10) 

Such a sequence may be generated by beginning with an arbitrary (allowed) worldline con- 
figuration Co and then considering some modified configuration C' . One then computes the 
probabilities for these configurations, P(C ) and P(C' Q ), and accepts the modified configura- 
tion with probability R for which we choose 

P(C>) W(C>) 
P(C ) + P(C ) W(C ) + W(C ) { • ' 

If the modification is accepted, the new configuration will be C\ = C' ; otherwise the old 
configurations will be retained: C\ = Cq. The procedure is then iterated to produce C 2 , C3, 
etc. 



C. Worldlines 

For fermion lattice models, the standard choice for the complete set of states |s) is the 
occupation number basis formed from the state \nx^, n^, ■ ■ ■ , ^ n ,T)l n i,J.) n 2,i, ■ ■ ■ , n N,i) by 
allowing each lattice site to assume all possible occupancies of up and down electrons [f|. 
The splitting of the Hamiltonian H = Hi + H 2 is usually done by having H\ and H 2 refer 
to the odd and even lattice sites 0] . With these choices, the product of the matrix elements 
in the expression ( |2.9|) for the partition function factorizes into products of matrix elements 
defined on shaded squares (plaquettes) of up and down spin checkerboards. The summation 
over all states becomes a summation over all combinations of occupancies n u il for each spin. 
Here, i labels the spatial (real space) position and I, the temporal (imaginary-time) position. 
A shaded plaquette for a given spin is one with the four sites (i, I), (i+ 1, /), (i, / + 1), and 
(i + 1, 1 + 1) where i and / are either both even or both odd. A worldline for a given spin is 
a continuous line constructed from straight line segments connecting occupied sites. A one- 
to-one correspondence between fermions and worldlines can be made if two parallel vertical 
(temporal) lines are assigned to a shaded plaquette with four occupied sites. These concepts 
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are illustrated in Fig. [T] which shows the checkerboard and a typical worldline configuration 
for one spin component of a system of 4 electrons on an 8 site lattice. 
Because of the checkerboarding, we can write the partition function as 



Z = £ Bgn(K,})W({T^}) 



K.J 



(2.12) 



^(KJ) = nn^ 

Z=l i=l 



/ 



2i-l,2Z 



n™ 



2i,2Z 



2i-l,2J-l n 2i,2l-l 



T 



n 2i-l,2l n 2i,2l 
n 2i-l,2l-l n 2i,2l-l j 



XW 



( „T n T 

2i,2Z+l '*'2i+l,2i+l 

,T 



n 



2i,2Z+l n 2i+l,2/-l 



n 2i,2« U 2i+l,2l J 



(2.13) 



V n 2i,2Z n 2«+l,2Z 

The arguments of the functions on the right-hand side are written in a form that emphasizes 
the plaquette structure for both up and down spins. Clearly, the weight (|2.13|) reduces to a 
product of weights for only the shaded plaquettes. If we further choose 



(2-14) 



then we can express the local weights as 
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(2.16) 



and 



Wy 



T T 
n{ n 2 



T T 
n 3 n 4 



ni n 1 



1 "-2 



n 3 n 4 



i=i 



(2.17) 



where 



w v {n, n') = e -i{u( n —2)( n '—2)-^ n + n '- 1 )] (2.18) 



In ( 2.16 ), we see the kinetic energy's contribution to the weight is non- vanishing only for 
certain sets of occupancies. In fact, out of the 2 4 = 16 possible plaquettes for either an up 
or a down spin, only 6 plaquettes satisfy local fermion conservation and these are shown in 
Fig. @. 

The manifold on which the worldlines are defined is a two-dimensional torus since we 
have a periodic boundary condition for both spatial and temporal directions. Accordingly, 
on this torus, we can define spatial and temporal winding numbers for any closed loop, 
for example, a worldline. To be more specific, to define a spatial winding number for a 
given loop, let us first suppose we have a spatial cut, i.e., a vertical line in Fig.l which 
starts somewhere at the bottom and ends at the top of the checkerboard. (The location 
of the cut does not matter since it does not affect the definition.) Then, if we trace the 
entire path of the loop and it passes a spatial cut in one direction m times and in the 
opposite direction n times, then the spatial winding number of this loop is defined to be 
m — n. The total spatial winding number of worldlines is defined as the sum of the spatial 
winding numbers of all the worldlines. A temporal winding number of a given loop and 
the total temporal winding number of worldlines are defined in the same fashion but with 
the temporal cut being a horizontal line in the checkerboard. In what follows, when we 
say spatial or temporal winding number, we will mean the total spatial or temporal winding 
number of the worldlines. 

The sign in ( |2.12|) is defined in terms of the worldlines 



sgn(K,}) = (-l)£,<*«-i) (2.19) 

where Z\ is the temporal winding number of the /-th worldline and the sum is taken over 
all the worldlines. Its origin is the fermion anti-commutation. For the one-dimensional 
Hubbard model, the standard worldline algorithms have no sign problem as Zi = 1 and is 
conserved. For the new algorithms we will be proposing, this will not be the case. Whenever 
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a sign problem occurred in our simulations, we treated it in the ordinary way ||; that is, 
when we have configurations of negative sign, we computed thermodynamic averages by 

(A) = £ sgn(C) A(C)/ £ sgn(C) (2.20) 
c c 

By using this formula, we do the Monte Carlo simulation with the following weight 

Z' = £ W({nl}). (2.21) 

that ignores the sign factor. 



D. The plaquette-fiip algorithm (Algorithm P) 

The worldline method can be implemented in several different ways. We will now high- 
light the main points of an implementation which focuses on Monte Carlo moves that change 
the local fermion occupancies of plaquettes defined on a checkerboarding of imaginary-time 
and space. This implementation leads to efficient algorithms not only on serial computers 
but also on vector and parallel machines PHTT . 



To generate different sequences of worldlines, one uses the observation [H] that only one 
basic movement of a worldline segment can occur: If an unshaded plaquette has a worldline 
segment on one vertical edge and none on the other, the allowed movement is vacating 
the one edge and occupying the other by moving the worldline segment across the unshaded 
plaquette. This movement is illustrated in Fig. |3|. Algorithm P accomplishes this by visiting 
in sequence each unshaded plaquette for both the up and down spin checkerboards and 
attempting a local move that flips all four corners of the unshaded plaquette. Flipping a 
corner means replacing the value of the variable riij by 1 — riij. The configuration weights 
W{C) and W(C) are determined by computing the product of the weights (|2.15|) for the four 



neighboring shaded plaquettes for both the old and flipped configuration. These weights are 



then used in ( 2.11 ) to determine whether the flip is accepted or not. An important point is 



that we need to know only local states (the occupancies of the shaded plaquettes surrounding 



11 



the unshaded one) to calculate this probability rather than the state of the whole system. 
This is the case because the attempted flip changes the configuration only locally since the 
Boltzmann weight ( |2.15| ) is factorized into a product of local weights . 



In algorithm P, the winding number of any worldline in any direction, temporal or 
spatial, is conserved. We found, however, that this broken ergodicity, as expected ||, has 
only a small effect on the simulation, especially for larger lattices. 

E. Auto-correlation times 

The basic property of the Monte Carlo method is to replace the thermodynamic average 
( |2.6|) by the average of A(t) over M Monte Carlo steps: 

1 M 

(A)MA = —52A(t) (2.22) 

m t=i 

where A(t) is the value of a physical quantity A (energy, SDW, etc.) at Monte Carlo step 
t. If M is large enough and the A(t) : s are statistically independent estimates of A, then the 
error estimate for A would be a / yM where 

1 M 

a 2 - 



1 t=i 



M 

To measure the degree of statistical correlations, we calculated two kinds of quantities. 
The first one is an integrated autocorrelation time defined by 

1 oo 

^t = -^ + E^W- (2-24) 
z t=o 

where 

r (f) = (A(t + t)A(t ))-(A(t ))(A(t )) 
A{ ] (A(to)A(t ))-(A(t ))(A(t )) ■ 1 • > 

In the actual simulation, r^(£) is approximated by 

r fA ih ^ + t)A(i) - ^ [gg A(i)] [gg^ A(i ti 
r A{t) » 1 v~> m r^^ a,\ TTi — ~ (2-26) 



±ZZilA(i)A(i)-AA] 
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The variance a 2 nt of data correlated with the time r^ t is related to the variance a 2 computed 
from ( |2.23| ) in the absence of correlations by Jl2[ 



af nt « 2r> 2 (2.27) 

Thus, in the presence of correlations, 2r^ t more simulation steps are needed in order to 

achieve the same variance of a measured quantity. Accordingly, error estimates computed 

when correlations are unknowingly present are always smaller than the actual error. 

Typically, the auto-correlation function is not a simple exponential function in time. 

This fact sometimes makes estimating r^ t by ( |2.24|) difficult. Another estimate of r^ t is 

obtained by examining what it takes to produce statistically independent measurements. 

We grouped the M measurements into n bins of length I = Mj n, and for a sequence of bin 

lengths (chosen to be 2, 4,8, . . .), we computed the bin averages A h {l) of the A(t) 

i u 

A b (l) = - £ A{t). (2.28) 
t=(b-i)i+i 

and the variance of these averages 

I n _ 

a{lf = rE(A(0-^) 2 (2-29) 

This variance should become inversely proportional to I as the bin size / becomes large 
enough so that the A^l) as a function of b become statistically independent [|nj. When 
statistical independence is approached, the quantity 

rLil) = ^ (2-30) 



where a 2 is computed from ( |2.23|) , approaches a constant as a function of / [ 13| . This 
constant is our estimate for r^ t . The error estimate becomes a{l*)/ ^/n where /* is the value 
of I at which the constant is reached. We note that with the use of ( |2.30| ), we can rewrite 



this error estimate as a^2r int /M and obtain the expected result ( |2. 27| ) . We also note that 
we have defined T^ t (l) so that r^(l) = ~. 

Generally, T^ t (l) is an increasing function of /. This increase, however, ceases when / 
becomes much larger than the auto-correlation time r^ t . As we discuss below, we found 
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cases where, in spite of a very long computer run, T^ t (l) keeps increasing as function of /. In 
such cases, we estimated a lower bound for r^ t by selecting the value T^ t (l) for the largest 
available I. 

A different measure of auto-correlation time we used is the exponential auto-correlation 
time r exp which by definition is the largest autocorrelation time associated with a given 
algorithm. We determined r exp from the t — > oo asymptotic behavior of 

T A (t) ~ j A exp(-t/T exp ), (2.31) 

by fitting a straight line to logT A (t). We remark that in practice, the most stable estimates 
of r^ t were obtained by summing r^(t) in ( 2.24 ) only up to some finite t = W, with W of 



the order of r exp , and computing the contribution of t > W from (|2.31|) . 

The auto-correlation time r exp corresponds to the second largest eigenvalue of the tran- 
sition probability matrix W(C — > C) for changing configuration C to C. (The largest 



eigenvalue is one, with the Boltzmann distribution as the eigenvector |1J].) This correlation 
time depends on the algorithm. The constant 7a in (|2.31| ) depends on the observable A (as 
well as on lattice size, temperature, etc.) and can be very small or zero for some observables. 
In general, r exp is more difficult to estimate precisely than r^ t . In order to determine ac- 
curately r exp , we must therefore use a suitable range of observables to establish consistency. 
In some cases, we could not obtain reliable estimates of r exp as we will see below. Another 
remark about r exp is that the longest-lived physical mode associated with it can be a mode 
whose contributions to the quantities of interest are negligible. We will discuss this point 
more fully later. 



III. NEW ALGORITHMS 
A. The loop-flip algorithm (Algorithm L) 



We now discuss the loop-flip algorithm (algorithm L). In contrast to algorithm P, 
which makes only local changes in configuration space based on local decisions, the loop- 
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flip algorithm makes non-local moves based on local decisions that are linked in a specific 
manner. The manner chosen is related to the loop-flip algorithm recently proposed by 
Evertz et al. [[| to reduce critical slowing down in simulations of the 6-vertex model. The 
formal connection of the six-vertex model with the fermion (and quantum spin) problem 
is through the shaded plaquettes for each spin assuming only 6 out of the 16 possible 
configurations. The main conceptual connection is through the recognition that the worldline 
and the six- vertex configurations fl5j can be parameterized by closed loops as a consequence 



of a zero- divergence property of the models. In the fermion case, this property is directly 
connected with the local conservation of fermions. From the loop point of view, the difference 
between two different worldline configurations must simply be one or more closed loops as 
the difference must also satisfy the zero-divergence condition. It is such differences that the 
loop-flip algorithm constructs. 

The core of our loop-flip algorithm is the "massless" case of the 6-vertex algorithm 
proposed by Evertz et al. which uses only "break-up" operations. For each spin (up or 
down), the 6 allowed plaquettes are mapped onto 3 new plaquettes that have lines drawn on 
them. Since each lattice site belongs to two shaded plaquettes, these lines (loop segments) 
are drawn so that each site is touched by one line, i.e., we have zero divergence. When these 
lines are connected, loops form, some of which are very long and wind one or more times 
around the temporal and spatial directions. For each spin, flipping the loop corresponds 
to changing electrons on the loop into holes and vice versa. In the process, the number of 
electrons can change. This change occurs when the net temporal winding number associated 
with the system changes. If the temporal winding number changes, a "sign problem" may 



occur 



The existence of just 6 allowable plaquettes for each spin does not mean the Hubbard 
model is equivalent to a combination of 6-vertex models. Several special considerations 
exist. In ( |2.13| ), the presence of the sign term is one. As we mentioned above, our use of 
the loop-flip algorithm can generate negative weights; fortunately, we found them for only 
uninteresting cases and then very rarely. A second consideration is the [/-term and //-term 
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in the Hubbard model. We take these terms into account by observing that in their absence 
the weight ( |2.15| ) factors into an up and down term, each of which maps onto a 6-vertex 
model. We apply our loop-flip algorithm to each of these pieces and then include the uy 
contribution by modifying the acceptance probability so that the detailed balance condition 
is satisfied. This modification is straightforward; our main task is specifying the loop-flip 
algorithm. 

To construct our algorithm, we first imagine that we stack the up and down spin checker- 
boards slightly above one another and focus our attention on blocks whose upper and lower 
faces are the up and down spin shaded plaquettes. (We have 36 allowed local configurations 
for each block.) Next, we define the variable cj) b which specifies the state of a block b on 
which the eight-body local interaction in ( [2.15|) is defined. For this variable, we take a pair 



of symbols, 4> b = (<f) b , 0^), each of which has one of the six allowed values of the plaquette. 
We symbolize these six values by 1,1, 2, 2, 3, and 3 and define them in Fig. |[ We then 
rewrite the Boltzmann weight ( |2.15| ) in terms of local variables defined on the shaded blocks 
as follows: 

w(c)=n^(0 6 )> (3.i) 
b 

w(<p b ) = u(4)u(4)v(4,4). (3.2) 

Our task is to generate a Markov process, in which different configurations are visited 
with a frequency proportional to the weight (|3.2| ). This can be done by cycling a Monte 
Carlo updating procedure that consists of two steps 0: The first step is the stochastic 
assignment of a label to every shaded block, and the second step is a Monte Carlo move 
using modified weights associated with these labels. 

In the first step, we assign a label tp b to each shaded block with a probability p(ifj b \<f)b) ■ 
Here, the label ip b is a pair of integers, (ip b ,ip b ), each of which may have a value 1, 2 or 
3. These labels will be related to decompositions of a plaquette into line segments. (See 
Fig. U) The probabilities are to be determined so that the detailed balance condition holds 
for the overall updating procedure and the sum rule 
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5>(^|0<,) = 1 (3.3) 

is satisfied. Generally, this assignment is an over-determined problem, but in the present 
problem, we can assign the probabilities uniquely, as we now discuss. 
To define these probabilities, we first define a new local weight 

«ty 6 (0&) = CfoW^ptyblfo), (3.4) 

where c^ b is a configuration independent renormalization constant which is to be determined 
so that ( |3.3|) holds. In the second step of the algorithm, the stochastic updating of states, a 



Monte Carlo step is performed using ( |3.4j) instead of ( |3.2|) . As long as the relationship ( |3.4| 



holds, this step together with the first step constitutes a single Monte Carlo move which 



satisfies the detailed balance condition fT7|j . 



Following the viewpoint of Kawashima and Gubernatis ||17|| , the algorithm is character- 
ized by the modified weights w^ b (4>b) since once we specify them and since the w((f>b) are 
given, the probabilistic assignment of the labels, i.e., p(ipb\4>b), is determined by ( |3.3j ) and 
ill). To show this, we define 



where v(4>j } , (f)\) is the same as in (|3.2| ) and 



0, if x = y or x = y 

u x (y) = { (3-6) 

1, otherwise 

The probability for the labeling procedure can then be written as 

pW\<h) = Pp(^M)ppM4)- (3-7) 
For a given spin, the labeling probability pp of one plaquette is easily found to be 



Pp(2|1) = 1 


-Pp(3|1) 


= Pp(2|1) 


= 1-Pp(3|1) 


= Pi, 




Pp(3|2) = 1 


-p P (l|2) 


= Pp(3|2) 


= 1-Pp(1|2) 


= P2, 




Pp(1|3) = 1 


-Pp(2|3) 


= Pp(1|3) 


= 1-Pp(2|3) 


= Ps, 


(3.8) 
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where 

1 i _|_ p T * 1 — e~ Tt 

In order to make the simulation effective, it is desirable to have an updating rule which 
guarantees that any resulting configuration is an allowed state. This is easy to do in the 
present case, since the state which can be reached by flipping a loop configuration is an 
allowed one. Actually, the modified weights fl3.6|) are chosen so that this correspondence 
holds. In order to explain what loop configurations are, we define how we construct a loop: 
For each shaded plaquette, we connect each corner site to another corner by a segment as 
depicted in Fig. [|. The connected sites form loops. We note that any configuration reached 
by flipping a loop is allowed since a flip of any segment of a loop results in an allowed local 
configuration. Flipping a loop means replacing all the electrons on the loop by holes and 
vice versa. 

If the v term in (|3.4| ) were not present, we would simply flip each loop independently with 
probability 1/2. In our case, with the v term present, each loop is flipped with a probability 

R= l/(l + e~ rA ) (3.10) 

where 

A = ^(2M-£)-±W-£), (3.11) 

where M is the total number of particles on the loop, M. is the total number of sites on 
which there are zero or two electrons, and £ is the length of the loop. We remark that 
the flip of one loop can change the value of A for the other loops. We, therefore, flip loops 
sequentially, one after the other. For N = 2 and L = 4, an example of a loop-flip is shown 
in Fig. |. 

We remark that any change of the worldlines by algorithm P can be realized by algorithm 
L with a finite probability. Additionally, any winding number can occur, and thus the 
algorithm simulates the grand canonical ensemble of the model. We can prove ergodicity 



and even a stronger statement that any state which does not violate local particle-number 
conservation can be reached with a finite probability from any other such state in one sweep 
of Monte Carlo updating. These statements are a consequence of a loop flip corresponding to 
changing the occupancy on all sites connected by the loop. Conversely, the difference between 
a pair of allowed worldlines configurations corresponds exactly to a loop configuration. Thus, 
any two allowed worldline configurations can be transformed in to each other by a single flip 
of a loop configuration. 



B. The loop-exchange algorithm (Algorithm L cx ) 

The loop-flip algorithm described above was an extension of the algorithm that Evertz 
et al. applied to the 6-vertex model; however, for the Hubbard model it has shortcomings 
which do not exist for the 6-vertex model or a quantum spin system. In particular, when U is 
large compared to T, long loops have little chance of being flipped [jnj . We can understand 
this behavior by observing that long loops decrease the acceptance probability (|3.11| ) when 



U > 0. Since we are often interested in the model with large positive U and small T, 
this can be problematic as it can lead to long auto-correlation times. In addition, we are 
also interested in non-half-filled cases. For them, we also face a similar difficulty of long 
correlation times because flipping a loop may change the total number of particles, and the 
acceptance ratio ( |3.10|) may therefore be strongly be suppressed by the /x-term (|3.11|) . 



In order to resolve these difficulties, we developed a new loop algorithm, the loop- 
exchange algorithm (algorithm L ex ). I* 1 this algorithm, two loops which have the same 
shape, but different spins, are flipped simultaneously. In other words, these flips do not 
change the coefficients of ?7-term nor //-term in (|3.11|) . Inherently, this algorithm is non- 
ergodic so it must be used with some ergodic algorithm to construct a correct Markov 
process. In the present paper, we use algorithm L for this purpose. 

We will discuss algorithm L ex as a modification of algorithm L. In algorithm L ex , all 36 
configurations of a block, except two, map uniquely onto a smaller set of labels. The rule for 
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label assignment is straightforward: First, we draw worldlines by following the ordinary rule 
described in Subsection [11 Q Then, we overlap these two plaquettes and erase all doubly 



drawn lines, i.e., worldline segments drawn at the same place on both the plaquettes. If the 
resulting picture has lines that dead-end at opposite corners of the same plaquette, these 
corners are connected by a straight line. After doing this for all blocks, we identify the 
resulting sets of overlapping worldline segments with labels and assign these labels with a 
probability of one. This labeling is illustrated in Fig. |6|. 

The two exceptional cases are those for which the rule of the label assignment is not 
unique. These cases are configurations (2,2) and (2,2). Each can have one of two labels, 
which we denote by 1 and 2. Their modified weights are defined as 

^((2, 2)) = ^((2, 2)) = w 2 ((2, 2)) = w 2 {{% 2)) = 1 (3.12) 

with all other modified weights, i.e., those W{ with the two exceptional labels but with i not 
equal to 1 or 2, vanishing. 

With the specification of all other modified weights as unity, the definition of the loop 
exchange algorithm is now complete. We also note that once the labels are chosen, the 
modified weights for all allowed loop configurations are equal and the transition probability 
of a Monte Carlo attempt is one-half for any loop, no matter how large U and \x may be. In 
terms of labeling probabilities, the algorithm L ex is characterized by 

p(l|(2,2)) =p(l|(2,2)) = 1 -tanhVt (3.13) 
p(2|(2,2)) =p(2|(2,2)) =tanh 2 rt. (3.14) 

For the other configurations, the labeling probability is 1 or as described above. 

For algorithm L ex , only an allowed state can be reached from a flipped loop. Both the 
up and down planes are treated simultaneously, whereas in algorithm L, they are treated 
independently as far as the loop construction is concerned. In algorithm L ex , therefore, only 
one set of loops exists for the two planes. The construction rule for the loops is simple: All 
we do is just connect the segments defined above, which have a one-to-one correspondence 
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to the labels. This rule gives connected loops; no open curves appear. A useful way to 
view this is as follows: The plaquettes in the right column of Fig. ^ are the result of an 
exclusive-or (XOR) operation on the overlaid plaquettes along the rows to the left. This 
XOR operation replaces the overlaid worldlines by a lattice with O's of l's at the lattice 
sites. The L ex loops connect the sites with the l's, of which there are 0, 2, or 4 per shaded 
plaquette. Thus, the loops are divergence- free. An example of a loop-exchange is shown in 
Fig. when N = 4 and L = 3. 

IV. RESULTS 

In this section, we present our numerical results for the integrated auto-correlation times 
( P-30Q associated with the average energy, electron occupancy ( |2.5|) , and the q = n, static 
charge- density Tx+ and spin-density Tx~ correlations (|2.4|). We will denote these times by 
Tfa t , r^ t , rf nt , and rf nt . We will also present results for r exp . 

In what follows, we will be concerned with various combinations of the three algorithms 
described in the previous sections. One case is the plaquette algorithm P by itself. Another 
is the combination of P and L, which we will refer to as PL. Other symbols, such as L, LL CX , 
and PLL ex , should be understood in a similar fashion. The application of any combination 
which includes L results naturally in a grand canonical and ergodic simulation, while the 
application P and L ex alone or together results in a canonical and non-ergodic simulation. 
Although we can force algorithm L to simulate the canonical ensemble, we did not do this 
unless otherwise stated. 

As we mentioned above, in some cases, especially when algorithm P was used alone, 
we were unable to estimate r^ t because the r^ t (l) versus I curve did not reach a plateau 
during the course of our rather lengthy Monte Carlo runs. In such cases, we took the largest 
available value (with still a reasonably large number bins remaining) as a practical lower 
bound on r^ t . This choice of a lower bound is justified since all the T^ t (l) versus / curves 
we computed, whether they reached the plateau or not, were non-decreasing functions. In 
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Fig. |8], we show T^ t (l) versus / curves for a typical example where using algorithm P, we were 
forced to take a lower bound, and where using algorithm PLL ex , we reached a plateau. This 
figure also illustrates that in some cases we can reduce the autocorrelation time dramatically 
by combining P with L and L ex . 

For most simulations, the length of the Monte Carlo calculation was 0.25 million Monte 
Carlo steps (MCS). In some exceptional cases, where the autocorrelation times are very long, 
we performed longer runs ||18|| . In algorithm P, a Monte Carlo step consists of a sweep across 
all unshaded plaquettes on which an attempt to flip its corners is made. In algorithms L and 
L ex , a Monte Carlo step means decomposing the whole system into loops and attempting 
to flip every loop. When several algorithms are combined, an MCS consists of one sweep 
through each algorithm. For example, in PL, one Monte Carlo step is one sweep through P 
and one sweep through L. 

In Fig. |9], various autocorrelation times for P and PLL ex are plotted for fixed value 
of y = Uj3/L = 0.5, where L is the lattice extension in imaginary time direction. From 
Fig. |S](a), (b), and (c), we see that for our new algorithm PLL ex , the autocorrelation time 
is very small, of order one, in almost all cases. After some initial increase, it is flat as a 
function of 1/T and also as a function of U. Thus, algorithm PLL ex performs very well. 

The behavior of the algorithm P, the conventional algorithm, is not as clear. At small 
(3, where we have performed very long simulations, we see that r int increases rapidly as a 
function of (3. We also notice also that as a function of U, r int increases rapidly. At larger 
/3, most of our (fairly short) runs with algorithm P did not converge, and we show only 
lower bounds for r int . It is possible that the initial increase of Ti nt with f3 actually continues 
towards larger (3. Therefore, some of the lower bounds on r int could possibly be an order of 
magnitude below the actual value. 

In short, comparing P and PLL ex , we see that at large U and small /3, the difference in 
performance between these algorithms is enormous, being more than 3 orders of magnitude 
at U = 8. As argued above, there could be an even bigger difference in performance when 
(3 becomes large. 
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Figure ^| also shows that our PLL ex algorithm is very effective in reducing the integrated 
correlation time for the SDW correlation function. The reason for this is that we can 
exchange worldlines for an up-spin and a down- spin electron in either algorithm L and L ex 
without visiting intermediate states with small weights; however, algorithm L ex is most 
effective in this regard as it was designed to do precisely this. At large U and at low 
temperature, the typical spatial worldline configuration is the one where up-spin and down- 
spin worldlines appear alternatively with small overlap. The overlap is strongly discouraged 
by a large value of U. In algorithm P, in order to exchange the positions of two worldlines, 
we need intermediate states in which the two worldlines overlap at least partially. We avoid 
these intermediate states with the algorithms L and L ex . We have observed strong SDW 
ordering of the system by looking at the average values of T\- at q = it. To show an example, 
Tx-(tt) = 0.884(1) at U = 8 and (3 = 0.5 for PLL CX whereas it becomes Tx_(tt) = 2.000(8) 
at p = 8. 

Figure |9](c) for the CDW function sharply contrasts that of Fig. |9](b) for the SDW function 
in that a significant increase in the correlation time does not occur as the temperature is 
lowered. (We again emphasize that the Ti nt displayed for algorithm P are in general lower 
bound estimates and are likely to be too small.) In fact, the correlation time is almost 
constant for (3 > 6 in all the six curves. The difference between P and PLL ex is far less 
noticeable: the correlation time for P is only a few times larger than its counterpart for 
PLL CX . Consistent with these observations is the fact that CDW order is much weaker than 
the SDW order. For example, Tx+(n) = 0.1281(5) at U = 8 and (3 = 0.5 for PLL ex whereas 
it becomes T X+ {tt) = 0.00380(2) at /3 = 8. 

The longest integrated correlation times for PLL ex are seen in Fig. ^|(d) for the average 
electron number. Since there are no fluctuations in the particle number in algorithm P, 
there is no corresponding curves for P for this quantity. However, even for PLL ex , the 
fluctuation in the particle number is very small. The variance in the particle number is 
sometimes so small that we could not obtain a reliable estimate for its correlation time. 
This undetermination is why some data points are missing in the figure, especially in the 
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region of large correlation times at lower temperatures. 

The comparison between r int and r exp in Fig. |9](d) shows that in most cases the inte- 
grated correlation time for the particle number is nearly as large as the longest mode 
correlation time r exp . This fact suggests that the longest mode in PLL ex is the mode in 
which creation and annihilation of particles are involved. Empirically, the small variance 
in the particle number and the long correlation time associated with charge creation and 
annihilation appear closely related. For example, in the case where N = 32, U = 8, \x = 0, 
1/T = 4 and N = 32, the square-root of the particle number variance (i.e., one standard 
deviation) measured by the algorithm PLL ex is 0.02. For lower temperatures, with the rest 
of parameters the same, we were unable to estimate this quantity reliably. However, the 
influence of this long mode for relevant physical quantities may be negligible because the 
change in this quantity may be so small that we can effectively consider it as a constant. 
This is the case with the long mode associated to the average particle number, since the 
average values of other physical quantities, such as energy and static SDW susceptibility, 
are not as sensitive to particle number fluctuations so this exponentially small fluctuation 
cannot appreciably affect the average value of those quantities. In this sense, the longest 
mode correlation time is not necessarily the relevant measure for the computational time, 
except if we are interested in the longest mode itself. 

On the other hand, although the creation and annihilation of particles are rather rare 
and cause long r eX pS in some cases, we found that it is still important in other cases to allow 
them to happen in order to reduce the integrated correlation times for quantities of physical 
interest. To see this, we have performed some simulations in which all the Monte Carlo 
attempts for loops whose winding number is non-vanishing (and therefore may change the 
particle number) are rejected. In Table [I], we show the correlation times for these special 
simulations in comparison with the grand canonical simulations. The result clearly shows 
the importance of creation and annihilation processes. 

As already mentioned, the present algorithm allows negative sign configurations to occur. 
We found, however, the negative sign ratio defined by [[0| 
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r=(z + - z.)/(z + + z_) (4.i) 

where Z + (ZJ) is the Boltzmann weight for the subset of configurations that are positive 
(negative), is close to unity for most cases, especially for larger system sizes. This observation 
means that the "sign problem" is not a problem for the new algorithms. In fact, the only 
cases for which we found any configurations of negative sign are the cases where N = 32, 
\i = 0.0, U = 2, y = 0.5, and (3 = 4, 6, 8, and 10. The values of T for these cases are 
0.9978(6), 0.979(4), 0.964(5) and 0.88(1). At larger values of U or higher temperatures, we 
did not observe any negative sign configurations. For smaller lattices, such as iV = 16, we 
observed a larger number of negative sign configurations, although they are again just a 
small fraction of the total. We expect that the negative sign ratio approaches to unity as 
the system size becomes large. 

The reduction in auto-correlation times brought about by algorithm L ex is most notice- 
able for the integrated correlation time for the SDW correlation function. In Table we 
list the correlation times for various combinations of the three algorithms, in the case where 
U = 4, (3 = 1.5, N = 32 and /i = 0.0, to examine the efficiency of several combinations. We 
can see that the integrated correlation time rf nt for the SDW correlation function for PLL ex 
is about 1/8 of that for PL. Since we have not carried out this type of investigation for 
a variety of physical parameters systematically, cases may exist where the efficiency of L ex 
is much more noticeable than presented here. The reason why the efficiency of L ex is most 
noticeable in the correlation time for SDW is conjectured as follows: As already argued, the 
integrated correlation time for SDW fluctuations is related to the exchange of two worldlines 
of opposite spins. Although we can achieve this exchange through either algorithm P or 
L, we inevitably encounter unpleasant intermediate states in the case of P. Algorithm L 
may suffer from similar difficulty because in forming loops in either the up-spin plane or the 
down-spin plane it does not use the information about the configuration in the other plane. 
Therefore, in some cases, the attempted movement of the worldline results in larger overlaps 
between down-spin worldlines and up-spin ones, and the move is improbable. On the other 
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hand, the algorithm L ex is reasonably free from this kind of difficulty. 

The acceptance ratios of the various algorithms also shows a wide range of behavior. In 
Table |T|, we list the acceptance ratios for some typical cases. The acceptance ratio r for 
one MCS is defined as 

_ [The number of flipped sites] 

n x [The number of sites in the system] ' 

where n = 2 for P and n = 1 for L and L ex . The integer n is the number of times a 
site is flipped in a MCS. For the algorithm P, this definition is equivalent to the ordinary 
acceptance ratio defined as the number of accepted attempts divided by the total number 



of attempts. The numbers in Table |T| are averaged over all the Monte Carlo steps. We 
see that the acceptance ratio for P and L are strongly dependent on (3 and U and become 
very small as the value of U increases. In contrast, the acceptance ratio for L ex is much less 
parameter dependent and only moderately increases to 1/2 as U increases. 

Some results on the quarter-filled case are presented in Table [IV]. In this case, in order 
to compare the results to those of the algorithm P on a roughly equal basis, we adjusted the 
chemical potential \x for the algorithm PLL ex so that the resulting average particle number 
becomes close to 0.5. The difference between the algorithms is less striking than that in 
the half-filled case although we can still see considerable reduction in the correlation time. 
We remark that for PLL ex the autocorrelations time for 1/4- filling are comparable to those 
at 1/2-filling. Since the overhead of PLL ex , relative to P, is approximately a factor of 10 
larger, the quarter-filled case is near a marginal filling where the overhead and the reduction 
of correlation time balances. 



V. CONCLUSION 

We presented two new algorithms for worldline Monte Carlo simulations of electron sys- 
tems and demonstrated their efficiency in the case of the one-dimensional, repulsive Hubbard 
model. Their extension to higher dimensions is straightforward. The first algorithm, which 
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we call the loop-flip algorithm, is an extension to fermion systems of a loop-flip algorithm for 
6- vertex model recently proposed by Evertz et al. Our new algorithm enables us to sim- 
ulate the grand canonical ensemble while the traditional plaquette flip algorithm simulates 
in the canonical ensemble. We also found that switching to grand canonical simulation often 
reduces the autocorrelation time of the simulation dramatically. The second algorithm is the 
loop-exchange algorithm in which up-spin and down-spin worldline segments exchange their 
locations. Since all worldline movement in the loop exchange algorithm is unaccompanied 
by changes in the chemical potential term or the [/-term in the Hamiltonian, we expected 
that this algorithm would be effective when the long correlation time is due to difficulties in 
exchanging worldlines. Consistent with this expectation, we found that the loop exchange 
algorithm is especially effective in reducing the autocorrelation time of the SD W correlation 
function when the value of U is large. These algorithms generalize the cluster algorithms || 
recently developed mainly to reduce long auto-correlation times accompanying simulations 
of critical phenomena in classical spin systems. In contrast to the usual use of cluster al- 
gorithms, we were concerned with reducing the inherently long auto-correlation times that 
occur in the worldline method even when the physical system is far removed from any known 
finite-temperature phase transition. 

At the lowest simulated temperatures, we observed reductions of the autocorrelation 
time, measured in the units of Monte Carlo steps, as large as three orders of magnitudes. 
One MCS in the algorithm PLL ex , however, takes from 8 to 10 times longer in CPU time 
than one step in algorithm P so the above reduction of autocorrelation time gives a reduction 
in CPU time by a factor as large as two orders of magnitude. 

The estimates of improved efficiency are cautious ones. In most cases we were able 
only to estimate a lower bound for the autocorrelation times associated with the plaquette 
algorithm. It is not unreasonable in some casses to expect the actual autocorrelation times to 
be an order of magnitude larger. Additionally, it should be possible to improve the efficiency 
our our implementation of the loop algorithms. We used a highly optimized code for the 
plaquette algorithm but used unoptimized codes for the loop algorithms. Further, we used 
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ordinary workstations for all the calculations. Parallelizing these computations is possible 
and will most likely be needed when the present algorithms are applied more challenging 
problems, such as those in higher dimensions. 

It is natural to ask about the extensions of the present methods to higher dimensions 
and other models. When one changes the problem, it is perhaps best to ask whether one 
should also change the methods. The algorithms presented were designed for the one- 
dimensional repulsive Hubbard model but can be viewed from the perspective of a general 
approach to developing cluster algorithms [ 17fl . Tailoring the algorithms, when extended to 



other problems, might be possible. Still, the direct use of our new methods to many other 
problems is possible. How efficient will be this use is what needs investigation. For many 
cases, their use should be more efficient than the standard algorithm. 

The most obvious extension would be to the two-dimensional repulsive Hubbard model. 
The standard worldline algorithm suffers such severe sign problems in two- dimensions that is 
often almost pointless to use it. We have no reason to believe our algorithms reduce the sign 
problem. Our best expectation would be the achievement of a significant enough reduction 
in the variance of the measured physical quantities so simulations could be performed over 



a limited range of parameters. Wiese |2D[ has reported the use an extension of the massless 
version of the Evertz et al. loop-flip algorithm on the two-dimensional, free fermion problem, 
and the few high temperature results he reported show seemingly accurate values for the 
average electron occupancy even in the presence of a severe sign problem. When U = fi = 
our loop-flip algorithm for fermions reduces to his if we only attempt to flip one loop at 
each Monte Carlo step. The potential effectiveness of these methods for fermions problems 
in two or more dimensions, while appearing not especially promising, needs more study. 

Of our two algorithms, the loop-exchange method is the one most specific to the repulsive 
Hubbard and related models. If U < 0, an algorithm, which breaks-up worldines wanting to 
be on top of one another, would be more appropriate and should be possible to construct. For 
the extended Hubbard model, which has a Coulomb term between electrons on neighboring 
sites, an additional specific construct might also be necessary. The loop-exchange algorithm, 
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because is addresses the hopping part of the problem, will change little from problem to 
problem. 

Loop algorithm L, in fact, can be almost directly applied to the spin-1/2 quantum spin 
chains. This applicability follows from the similarity the path-integral representation of the 
spin-1/2 quantum spin system has with spinless fermion model |TT|. Use of these algorithms 



in two-dimensions also appears quite direct and desirable. Wiese and Ying |21j , using what 
turns out to be the massless version of of the Evertz et al. method, studied the spin-1/2 
Heisenberg anti-ferromagnet with success, although they do not report efficiency figures of 
merit. Recently, we developed cluster algorithms for spin of arbitrary magnitude |22] and 
are in the process of testing these methods. We will report our results elsewhere. 

In closing, we comment that the total temporal and spatial winding numbers are both 
physically important. The total temporal winding number is the total number of particles 
and the average of the square of its fluctuations is related to the charge compressibility of the 
system. The average of the square of the fluctuations of the total spatial winding number 
is directly related to the dc conductivity |2"3 |. In the conventional worldline Monte Carlo, 



i.e., algorithm P, the total winding number of either type is conserved. To estimate the 
charge compressibility and dc conductivity, we have to do some additional manipualtions. 
In the new algorithms, these manipulations seem unnecessary because the total winding 
numbers of both kinds can change simulaneously, individually, or not at all. We can see the 
changes by noting that in algorithm L flipping a single loop of spatial winding number w 
changes the total spatial winding number by w, and that a loop with any spatial winding 
number can form with a finite probability as long as it is allowed in the size of the system 
under consideration. A similar remark can be made with respect to the temporal winding 
number. Therefore, we can directly estimate the above-mentioned quantities. We have 
yet to investigate the efficiency of this possibility because in this paper we were mainly 
concerned with overall algorithmic issues and development. An additional important point 
is that similar possibilities will also exist for the relevant extension of the loop algorithms 
to bosons and quantum spin systems. 
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TABLES 

TABLE I. Auto-correlation times for U = 8, L = 8, l/T = 1, N = 16, and p = 0. 



Algorithm 


tnt 


T N 
int 


tnt 


int 


T~exp 


PL (canonical) 


>46 


OO 


13(2) 


60(5) 


a 


PL (grand canonical) 


1.7(2) 


0.89(3) 


0.81(1) 


10.8(9) 


19(7) 


PLL ex (canonical) 


>28 


oo 


10.7(1) 


1.80(5) 


a 


PLL CX (grand canonical) 


0.99(4) 


0.92(2) 


0.718(7) 


0.509(8) 


1.75(5) 



a Unable to measure. 



TABLE II. Auto-correlation times for U = 4, L = 12, l/T = 1.5, N = 32, and fj, = 0. 



Algorithm 



P 102(7) oo 17(1) 45(2) - a 

L 4.2(3) 1.22(6) 0.73(3) 5.6(5) 10(1) 

PL 2.0(2) 1.20(3) 0.63(2) 3.8(1) 5.1(5) 

PL CX >33 oo >5.2 >1.7 — a 

LL CX 3.1(2) 1.28(4) 0.711(4) 0.60(3) 5.9(8) 

PLL CX 1.26(2) 1.28(8) 0.63(3) 0.525(4) 3.4(6) 
a Unable to measure. 
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TABLE III. Acceptance ratios for N = 32, U//3L = 1, and ji = 0. 



P Algorithm U = 2 [7 = 4 U = 8 

2 P 0.146 0.071 0.020 

L 0.208 0.074 0.012 

L cx 0.349 0.413 0.478 

6 P 0.157 0.081 0.028 

L 0.105 0.050 0.014 

L cx 0.355 0.415 0.471 



TABLE IV. Auto-correlation times for N = 16, L = 8, and U = 8, at quarter filling or nearly 
quarter filling. 



Algorithm 


l/T 


(n) 


int 


T N 

int 


int 


T s 

int 


T~exp 


P 


1.0 


0.5 


12.4(9) 


CO 


6.8(4) 


9.9(5) 


a 


P 


2.0 


0.5 


22(5) 


CO 


8.1(3) 


10.6(8) 


a 


P 


4.0 


0.5 


28(2) 


CO 


10.0(7) 


11(1) 


a 


PLL CX 


1.0 


0.4975(2) 


1.12(5) 


1.24(6) 


0.55(1) 


0.514(5) 


1.21(3) 


PLL ex 


2.0 


0.4940(6) 


1.51(5) 


3.3(2) 


0.786(9) 


0.60(2) 


4.0(1) 


PLL ex 


4.0 


0.4824(3) 


2.0(2) 


34(4) 


1.49(3) 


1.1(1) 


35(2) 



a Unable to measure. 
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FIGURES 

FIG. 1. An example of the space-time checkerboard for a single component of spin. Here, 4 
electrons of the same spin are on 8 lattice sites, which are along the horizontal direction. In the 
vertical direction are 2L = 8 imaginary-time steps. The symbols Si along the left edge designate 
the many-body state at that imaginary-time step. The ket IS2) = 1 1 , 1, 0, 0, 1, 1, 0, 0), for example. 

FIG. 2. The six allowed shaded plaquettes. The values of <p are our symbols for the depicted 
plaquette. 

FIG. 3. The allowed movements of worldline segments back and forth across a unshaded 
plaquette. 

FIG. 4. The labeling probability for the algorithm L. In the top row, the correspondence 
between labels and loop decompositions is shown. In the leftmost column, the correspondence 
between symbols (1,1,2,2,3 and 3 ) and states are shown. Solid circles stand for particles while 
open circles for holes. 

FIG. 5. An example of a loop flip: (a) a configuration with a single fermion, (b) a possible 
decomposition resulting in three loops, and (c) a possible worldline configuration resulting from 
the loop flips. Depicted in (c) is a two fermion configuration with a temporal winding number of 
two and a spatial winding number of one. Between (a) and (c), a sign change occurred. 

FIG. 6. Correspondence between local states and loop segments in the algorithm L cx . The 
states are depicted in terms of worldline segments. Solid lines are for down-spin particles and 
dashed lines are for up-spin particles. 

FIG. 7. An example of a loop-exchange. 

FIG. 8. The bin-length length dependence of T^ t (l) for the SDW fluctuation for both the P 
and PLL CX algorithms, (a) A case where r^ t (7) does not converge (N = 32, L = 160, U = 8,/3 = 8 
and [i = 0.0). (b) A case where T^ t (l) converges. 
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FIG. 9. The integrated auto-correlation times for (a) energy, (b) SDW fluctuation, (c) CDW 
fluctuation and (d) the number of particles, in the case of N = 32, /j, = 0.0 and U(3/L = 0.5. The 
double symbols and dashed curves show (most likely poor) lower bounds for the correlation times, 
not the correlation times themselves. The curves are computed for several values of U. 
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